fix(llmobs): openai-java payload mapping for responses, tool metadata, and prompt tracking #10644

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 50 commits into master from ygree/llmobs-systest-fixes
Apr 1, 2026

Conversation

@ygree
Contributor

@ygree ygree commented Feb 19, 2026

What Does This Do

Aligns OpenAI Java LLMObs span payloads with the expected intake/system-test schema by:

  • Adding/filling missing LLMObs tags:
    • _ml_obs_tag.integration
    • _ml_obs_tag.source
    • _ml_obs_tag.ddtrace.version
    • _ml_obs_tag.error
    • _ml_obs_tag.error_type
  • Ensuring model_name (and stable placeholder output where applicable) is set on error paths for
    chat/completions/embeddings/responses.
  • Expanding Responses instrumentation:
    • prompt tracking (input.prompt, variables, chat_template)
    • tool definition extraction (tool_definitions)
    • tool call/result extraction across function/custom/MCP outputs
    • metadata normalization (stream, tool_choice, text.verbosity, etc.)
  • Updating LLMObs mapper payload shape:
    • writes _dd map with span/trace ids
    • nests error fields under meta.error
    • supports map-based LLM input serialization (messages + prompt)
    • remaps tool_definitions into meta.
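
To make the mapper changes above concrete, here is a minimal sketch of the payload shape, using plain Java maps in place of the real msgpack writer. The class name LlmObsPayloadSketch, the buildSpan helper, and the exact key names inside _dd and meta.error (span_id, trace_id, apm_trace_id, type, message) are assumptions for illustration; only the _dd map, the status field, and the nesting of error details under meta.error come from this PR and its review discussion. The sketch sets apm_trace_id equal to trace_id, which is an assumption based on the review question about that field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LlmObsPayloadSketch {

  // Builds a span payload as nested maps (hypothetical helper, not the PR's mapper).
  static Map<String, Object> buildSpan(
      String spanId, String traceId, boolean errored, String errorType, String errorMessage) {
    Map<String, Object> span = new LinkedHashMap<>();
    span.put("span_id", spanId);
    span.put("trace_id", traceId);
    // A string status is written instead of a top-level integer error flag.
    span.put("status", errored ? "error" : "ok");

    // _dd carries the span/trace ids; key names here are assumed.
    Map<String, Object> dd = new LinkedHashMap<>();
    dd.put("span_id", spanId);
    dd.put("trace_id", traceId);
    dd.put("apm_trace_id", traceId);
    span.put("_dd", dd);

    // Error details are nested under meta.error only when the span errored.
    Map<String, Object> meta = new LinkedHashMap<>();
    if (errored) {
      Map<String, Object> error = new LinkedHashMap<>();
      error.put("type", errorType);
      error.put("message", errorMessage);
      meta.put("error", error);
    }
    span.put("meta", meta);
    return span;
  }

  public static void main(String[] args) {
    System.out.println(buildSpan("123", "456", true, "java.io.IOException", "timeout"));
  }
}
```

A successful span built this way has status "ok" and no error key under meta, so consumers can distinguish the two cases without a top-level error field.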

Motivation

OpenAI/LLMObs system tests exposed schema and tag mismatches in Java payloads (especially response spans, tool metadata, error mapping, and prompt tracking structure). This change brings Java output in line with the expected LLMObs intake contract and behavior.

Additional Notes

DataDog/dd-apm-test-agent#280
DataDog/system-tests#6364

Contributor Checklist

Jira ticket: [PROJ-IDENT]

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.

@ygree ygree self-assigned this Feb 19, 2026
@ygree ygree added comp: mlobs ML Observability (LLMObs) type: bug Bug report and fix labels Feb 19, 2026
@pr-commenter

pr-commenter bot commented Feb 19, 2026

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master ygree/llmobs-systest-fixes
git_commit_date 1773939812 1774929373
git_commit_sha 5580c61 d7d4866
release_version 1.61.0-SNAPSHOT~5580c61ac4 1.60.0-SNAPSHOT~d7d4866358
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1774931196 1774931196
ci_job_id 1553021997 1553021997
ci_pipeline_id 105184639 105184639
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-iclaexns 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-iclaexns 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module Agent Agent
parent None None

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 61 metrics, 10 unstable metrics.

Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.60.0-SNAPSHOT~d7d4866358, baseline=1.61.0-SNAPSHOT~5580c61ac4

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.066 s) : 0, 1065982
Total [baseline] (11.034 s) : 0, 11034499
Agent [candidate] (1.067 s) : 0, 1067232
Total [candidate] (11.157 s) : 0, 11157281
section appsec
Agent [baseline] (1.251 s) : 0, 1251474
Total [baseline] (11.214 s) : 0, 11213666
Agent [candidate] (1.252 s) : 0, 1251814
Total [candidate] (11.214 s) : 0, 11214115
section iast
Agent [baseline] (1.236 s) : 0, 1235512
Total [baseline] (11.358 s) : 0, 11357729
Agent [candidate] (1.233 s) : 0, 1232603
Total [candidate] (11.389 s) : 0, 11388723
section profiling
Agent [baseline] (1.188 s) : 0, 1188429
Total [baseline] (11.125 s) : 0, 11125225
Agent [candidate] (1.188 s) : 0, 1187616
Total [candidate] (11.08 s) : 0, 11080254
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.066 s -
Agent appsec 1.251 s 185.492 ms (17.4%)
Agent iast 1.236 s 169.53 ms (15.9%)
Agent profiling 1.188 s 122.447 ms (11.5%)
Total tracing 11.034 s -
Total appsec 11.214 s 179.167 ms (1.6%)
Total iast 11.358 s 323.231 ms (2.9%)
Total profiling 11.125 s 90.726 ms (0.8%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.067 s -
Agent appsec 1.252 s 184.582 ms (17.3%)
Agent iast 1.233 s 165.371 ms (15.5%)
Agent profiling 1.188 s 120.384 ms (11.3%)
Total tracing 11.157 s -
Total appsec 11.214 s 56.833 ms (0.5%)
Total iast 11.389 s 231.442 ms (2.1%)
Total profiling 11.08 s -77.027 ms (-0.7%)
gantt
    title petclinic - break down per module: candidate=1.60.0-SNAPSHOT~d7d4866358, baseline=1.61.0-SNAPSHOT~5580c61ac4

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.205 ms) : 0, 1205
crashtracking [candidate] (1.205 ms) : 0, 1205
BytebuddyAgent [baseline] (633.286 ms) : 0, 633286
BytebuddyAgent [candidate] (633.527 ms) : 0, 633527
AgentMeter [baseline] (29.666 ms) : 0, 29666
AgentMeter [candidate] (29.55 ms) : 0, 29550
GlobalTracer [baseline] (259.607 ms) : 0, 259607
GlobalTracer [candidate] (260.293 ms) : 0, 260293
AppSec [baseline] (32.08 ms) : 0, 32080
AppSec [candidate] (31.964 ms) : 0, 31964
Debugger [baseline] (60.91 ms) : 0, 60910
Debugger [candidate] (60.797 ms) : 0, 60797
Remote Config [baseline] (594.453 µs) : 0, 594
Remote Config [candidate] (600.111 µs) : 0, 600
Telemetry [baseline] (8.094 ms) : 0, 8094
Telemetry [candidate] (8.795 ms) : 0, 8795
Flare Poller [baseline] (4.409 ms) : 0, 4409
Flare Poller [candidate] (4.337 ms) : 0, 4337
section appsec
crashtracking [baseline] (1.217 ms) : 0, 1217
crashtracking [candidate] (1.196 ms) : 0, 1196
BytebuddyAgent [baseline] (661.863 ms) : 0, 661863
BytebuddyAgent [candidate] (660.807 ms) : 0, 660807
AgentMeter [baseline] (12.205 ms) : 0, 12205
AgentMeter [candidate] (12.134 ms) : 0, 12134
GlobalTracer [baseline] (259.041 ms) : 0, 259041
GlobalTracer [candidate] (258.918 ms) : 0, 258918
AppSec [baseline] (177.815 ms) : 0, 177815
AppSec [candidate] (178.969 ms) : 0, 178969
Debugger [baseline] (66.048 ms) : 0, 66048
Debugger [candidate] (65.561 ms) : 0, 65561
Remote Config [baseline] (652.511 µs) : 0, 653
Remote Config [candidate] (660.975 µs) : 0, 661
Telemetry [baseline] (8.295 ms) : 0, 8295
Telemetry [candidate] (8.349 ms) : 0, 8349
Flare Poller [baseline] (3.551 ms) : 0, 3551
Flare Poller [candidate] (4.42 ms) : 0, 4420
IAST [baseline] (24.237 ms) : 0, 24237
IAST [candidate] (24.303 ms) : 0, 24303
section iast
crashtracking [baseline] (1.202 ms) : 0, 1202
crashtracking [candidate] (1.222 ms) : 0, 1222
BytebuddyAgent [baseline] (800.913 ms) : 0, 800913
BytebuddyAgent [candidate] (799.254 ms) : 0, 799254
AgentMeter [baseline] (11.578 ms) : 0, 11578
AgentMeter [candidate] (11.427 ms) : 0, 11427
GlobalTracer [baseline] (249.318 ms) : 0, 249318
GlobalTracer [candidate] (248.472 ms) : 0, 248472
AppSec [baseline] (26.646 ms) : 0, 26646
AppSec [candidate] (26.474 ms) : 0, 26474
Debugger [baseline] (70.282 ms) : 0, 70282
Debugger [candidate] (70.291 ms) : 0, 70291
Remote Config [baseline] (539.482 µs) : 0, 539
Remote Config [candidate] (541.683 µs) : 0, 542
Telemetry [baseline] (9.774 ms) : 0, 9774
Telemetry [candidate] (9.723 ms) : 0, 9723
Flare Poller [baseline] (3.563 ms) : 0, 3563
Flare Poller [candidate] (3.573 ms) : 0, 3573
IAST [baseline] (25.426 ms) : 0, 25426
IAST [candidate] (25.441 ms) : 0, 25441
section profiling
ProfilingAgent [baseline] (94.44 ms) : 0, 94440
ProfilingAgent [candidate] (93.874 ms) : 0, 93874
crashtracking [baseline] (1.183 ms) : 0, 1183
crashtracking [candidate] (1.181 ms) : 0, 1181
BytebuddyAgent [baseline] (685.267 ms) : 0, 685267
BytebuddyAgent [candidate] (685.71 ms) : 0, 685710
AgentMeter [baseline] (9.024 ms) : 0, 9024
AgentMeter [candidate] (9.029 ms) : 0, 9029
GlobalTracer [baseline] (216.145 ms) : 0, 216145
GlobalTracer [candidate] (216.332 ms) : 0, 216332
AppSec [baseline] (32.523 ms) : 0, 32523
AppSec [candidate] (32.374 ms) : 0, 32374
Debugger [baseline] (65.841 ms) : 0, 65841
Debugger [candidate] (66.089 ms) : 0, 66089
Remote Config [baseline] (567.356 µs) : 0, 567
Remote Config [candidate] (565.814 µs) : 0, 566
Telemetry [baseline] (7.807 ms) : 0, 7807
Telemetry [candidate] (7.693 ms) : 0, 7693
Flare Poller [baseline] (4.331 ms) : 0, 4331
Flare Poller [candidate] (3.479 ms) : 0, 3479
Profiling [baseline] (95.016 ms) : 0, 95016
Profiling [candidate] (94.434 ms) : 0, 94434
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.60.0-SNAPSHOT~d7d4866358, baseline=1.61.0-SNAPSHOT~5580c61ac4

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.06 s) : 0, 1060219
Total [baseline] (8.829 s) : 0, 8829299
Agent [candidate] (1.058 s) : 0, 1058299
Total [candidate] (8.844 s) : 0, 8843544
section iast
Agent [baseline] (1.232 s) : 0, 1232137
Total [baseline] (9.579 s) : 0, 9578833
Agent [candidate] (1.231 s) : 0, 1231212
Total [candidate] (9.55 s) : 0, 9549685
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.06 s -
Agent iast 1.232 s 171.918 ms (16.2%)
Total tracing 8.829 s -
Total iast 9.579 s 749.534 ms (8.5%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.058 s -
Agent iast 1.231 s 172.913 ms (16.3%)
Total tracing 8.844 s -
Total iast 9.55 s 706.141 ms (8.0%)
gantt
    title insecure-bank - break down per module: candidate=1.60.0-SNAPSHOT~d7d4866358, baseline=1.61.0-SNAPSHOT~5580c61ac4

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.217 ms) : 0, 1217
crashtracking [candidate] (1.207 ms) : 0, 1207
BytebuddyAgent [baseline] (631.03 ms) : 0, 631030
BytebuddyAgent [candidate] (630.07 ms) : 0, 630070
AgentMeter [baseline] (29.378 ms) : 0, 29378
AgentMeter [candidate] (29.539 ms) : 0, 29539
GlobalTracer [baseline] (257.528 ms) : 0, 257528
GlobalTracer [candidate] (257.604 ms) : 0, 257604
AppSec [baseline] (32.081 ms) : 0, 32081
AppSec [candidate] (31.867 ms) : 0, 31867
Debugger [baseline] (59.916 ms) : 0, 59916
Debugger [candidate] (59.734 ms) : 0, 59734
Remote Config [baseline] (590.982 µs) : 0, 591
Remote Config [candidate] (585.146 µs) : 0, 585
Telemetry [baseline] (8.775 ms) : 0, 8775
Telemetry [candidate] (8.03 ms) : 0, 8030
Flare Poller [baseline] (3.565 ms) : 0, 3565
Flare Poller [candidate] (3.516 ms) : 0, 3516
section iast
crashtracking [baseline] (1.208 ms) : 0, 1208
crashtracking [candidate] (1.196 ms) : 0, 1196
BytebuddyAgent [baseline] (799.787 ms) : 0, 799787
BytebuddyAgent [candidate] (799.511 ms) : 0, 799511
AgentMeter [baseline] (11.373 ms) : 0, 11373
AgentMeter [candidate] (11.405 ms) : 0, 11405
GlobalTracer [baseline] (248.079 ms) : 0, 248079
GlobalTracer [candidate] (248.357 ms) : 0, 248357
AppSec [baseline] (26.661 ms) : 0, 26661
AppSec [candidate] (26.539 ms) : 0, 26539
Debugger [baseline] (67.347 ms) : 0, 67347
Debugger [candidate] (68.264 ms) : 0, 68264
Remote Config [baseline] (536.997 µs) : 0, 537
Remote Config [candidate] (528.589 µs) : 0, 529
Telemetry [baseline] (11.364 ms) : 0, 11364
Telemetry [candidate] (10.096 ms) : 0, 10096
Flare Poller [baseline] (3.857 ms) : 0, 3857
Flare Poller [candidate] (3.649 ms) : 0, 3649
IAST [baseline] (25.45 ms) : 0, 25450
IAST [candidate] (25.466 ms) : 0, 25466

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master ygree/llmobs-systest-fixes
git_commit_date 1773939812 1774929373
git_commit_sha 5580c61 d7d4866
release_version 1.61.0-SNAPSHOT~5580c61ac4 1.60.0-SNAPSHOT~d7d4866358
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1774931590 1774931590
ci_job_id 1553021998 1553021998
ci_pipeline_id 105184639 105184639
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-cddcgd5l 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-cddcgd5l 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 20 metrics, 16 unstable metrics.

Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~d7d4866358, baseline=1.61.0-SNAPSHOT~5580c61ac4
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.234 ms) : 1222, 1246
.   : milestone, 1234,
iast (3.242 ms) : 3199, 3284
.   : milestone, 3242,
iast_FULL (5.863 ms) : 5805, 5921
.   : milestone, 5863,
iast_GLOBAL (3.561 ms) : 3500, 3622
.   : milestone, 3561,
profiling (2.159 ms) : 2139, 2180
.   : milestone, 2159,
tracing (1.872 ms) : 1856, 1887
.   : milestone, 1872,
section candidate
no_agent (1.217 ms) : 1207, 1228
.   : milestone, 1217,
iast (3.361 ms) : 3317, 3406
.   : milestone, 3361,
iast_FULL (5.886 ms) : 5828, 5945
.   : milestone, 5886,
iast_GLOBAL (3.599 ms) : 3544, 3654
.   : milestone, 3599,
profiling (2.094 ms) : 2076, 2112
.   : milestone, 2094,
tracing (1.856 ms) : 1840, 1873
.   : milestone, 1856,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.234 ms [1.222 ms, 1.246 ms] -
iast 3.242 ms [3.199 ms, 3.284 ms] 2.007 ms (162.7%)
iast_FULL 5.863 ms [5.805 ms, 5.921 ms] 4.629 ms (375.1%)
iast_GLOBAL 3.561 ms [3.5 ms, 3.622 ms] 2.327 ms (188.5%)
profiling 2.159 ms [2.139 ms, 2.18 ms] 925.242 µs (75.0%)
tracing 1.872 ms [1.856 ms, 1.887 ms] 637.583 µs (51.7%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.217 ms [1.207 ms, 1.228 ms] -
iast 3.361 ms [3.317 ms, 3.406 ms] 2.144 ms (176.1%)
iast_FULL 5.886 ms [5.828 ms, 5.945 ms] 4.669 ms (383.5%)
iast_GLOBAL 3.599 ms [3.544 ms, 3.654 ms] 2.382 ms (195.6%)
profiling 2.094 ms [2.076 ms, 2.112 ms] 876.495 µs (72.0%)
tracing 1.856 ms [1.84 ms, 1.873 ms] 638.782 µs (52.5%)
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~d7d4866358, baseline=1.61.0-SNAPSHOT~5580c61ac4
    dateFormat X
    axisFormat %s
section baseline
no_agent (19.196 ms) : 19001, 19391
.   : milestone, 19196,
appsec (18.655 ms) : 18466, 18845
.   : milestone, 18655,
code_origins (17.903 ms) : 17726, 18081
.   : milestone, 17903,
iast (18.397 ms) : 18211, 18584
.   : milestone, 18397,
profiling (19.219 ms) : 19028, 19410
.   : milestone, 19219,
tracing (18.04 ms) : 17863, 18218
.   : milestone, 18040,
section candidate
no_agent (19.31 ms) : 19115, 19506
.   : milestone, 19310,
appsec (18.903 ms) : 18708, 19098
.   : milestone, 18903,
code_origins (18.019 ms) : 17841, 18197
.   : milestone, 18019,
iast (19.245 ms) : 19054, 19437
.   : milestone, 19245,
profiling (18.781 ms) : 18593, 18969
.   : milestone, 18781,
tracing (18.845 ms) : 18656, 19034
.   : milestone, 18845,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 19.196 ms [19.001 ms, 19.391 ms] -
appsec 18.655 ms [18.466 ms, 18.845 ms] -540.582 µs (-2.8%)
code_origins 17.903 ms [17.726 ms, 18.081 ms] -1.293 ms (-6.7%)
iast 18.397 ms [18.211 ms, 18.584 ms] -798.791 µs (-4.2%)
profiling 19.219 ms [19.028 ms, 19.41 ms] 23.11 µs (0.1%)
tracing 18.04 ms [17.863 ms, 18.218 ms] -1.156 ms (-6.0%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 19.31 ms [19.115 ms, 19.506 ms] -
appsec 18.903 ms [18.708 ms, 19.098 ms] -406.694 µs (-2.1%)
code_origins 18.019 ms [17.841 ms, 18.197 ms] -1.291 ms (-6.7%)
iast 19.245 ms [19.054 ms, 19.437 ms] -64.667 µs (-0.3%)
profiling 18.781 ms [18.593 ms, 18.969 ms] -528.883 µs (-2.7%)
tracing 18.845 ms [18.656 ms, 19.034 ms] -465.417 µs (-2.4%)

Dacapo

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master ygree/llmobs-systest-fixes
git_commit_date 1773939812 1774929373
git_commit_sha 5580c61 d7d4866
release_version 1.61.0-SNAPSHOT~5580c61ac4 1.60.0-SNAPSHOT~d7d4866358
See matching parameters
Baseline Candidate
application biojava biojava
ci_job_date 1774931359 1774931359
ci_job_id 1553021999 1553021999
ci_pipeline_id 105184639 105184639
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-rhiksb9l 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-rhiksb9l 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metric.

Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~d7d4866358, baseline=1.61.0-SNAPSHOT~5580c61ac4
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.478 ms) : 1467, 1490
.   : milestone, 1478,
appsec (3.775 ms) : 3554, 3996
.   : milestone, 3775,
iast (2.26 ms) : 2191, 2329
.   : milestone, 2260,
iast_GLOBAL (2.306 ms) : 2236, 2376
.   : milestone, 2306,
profiling (2.081 ms) : 2026, 2135
.   : milestone, 2081,
tracing (2.088 ms) : 2034, 2142
.   : milestone, 2088,
section candidate
no_agent (1.484 ms) : 1473, 1496
.   : milestone, 1484,
appsec (3.803 ms) : 3581, 4024
.   : milestone, 3803,
iast (2.258 ms) : 2189, 2327
.   : milestone, 2258,
iast_GLOBAL (2.307 ms) : 2237, 2376
.   : milestone, 2307,
profiling (2.101 ms) : 2046, 2157
.   : milestone, 2101,
tracing (2.068 ms) : 2014, 2121
.   : milestone, 2068,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.478 ms [1.467 ms, 1.49 ms] -
appsec 3.775 ms [3.554 ms, 3.996 ms] 2.296 ms (155.4%)
iast 2.26 ms [2.191 ms, 2.329 ms] 781.831 µs (52.9%)
iast_GLOBAL 2.306 ms [2.236 ms, 2.376 ms] 827.764 µs (56.0%)
profiling 2.081 ms [2.026 ms, 2.135 ms] 602.774 µs (40.8%)
tracing 2.088 ms [2.034 ms, 2.142 ms] 609.68 µs (41.2%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.484 ms [1.473 ms, 1.496 ms] -
appsec 3.803 ms [3.581 ms, 4.024 ms] 2.318 ms (156.2%)
iast 2.258 ms [2.189 ms, 2.327 ms] 773.954 µs (52.1%)
iast_GLOBAL 2.307 ms [2.237 ms, 2.376 ms] 822.457 µs (55.4%)
profiling 2.101 ms [2.046 ms, 2.157 ms] 617.153 µs (41.6%)
tracing 2.068 ms [2.014 ms, 2.121 ms] 583.474 µs (39.3%)
Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~d7d4866358, baseline=1.61.0-SNAPSHOT~5580c61ac4
    dateFormat X
    axisFormat %s
section baseline
no_agent (14.88 s) : 14880000, 14880000
.   : milestone, 14880000,
appsec (14.608 s) : 14608000, 14608000
.   : milestone, 14608000,
iast (18.083 s) : 18083000, 18083000
.   : milestone, 18083000,
iast_GLOBAL (18.14 s) : 18140000, 18140000
.   : milestone, 18140000,
profiling (15.717 s) : 15717000, 15717000
.   : milestone, 15717000,
tracing (14.988 s) : 14988000, 14988000
.   : milestone, 14988000,
section candidate
no_agent (15.407 s) : 15407000, 15407000
.   : milestone, 15407000,
appsec (14.593 s) : 14593000, 14593000
.   : milestone, 14593000,
iast (18.251 s) : 18251000, 18251000
.   : milestone, 18251000,
iast_GLOBAL (17.785 s) : 17785000, 17785000
.   : milestone, 17785000,
profiling (15.464 s) : 15464000, 15464000
.   : milestone, 15464000,
tracing (15.094 s) : 15094000, 15094000
.   : milestone, 15094000,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 14.88 s [14.88 s, 14.88 s] -
appsec 14.608 s [14.608 s, 14.608 s] -272.0 ms (-1.8%)
iast 18.083 s [18.083 s, 18.083 s] 3.203 s (21.5%)
iast_GLOBAL 18.14 s [18.14 s, 18.14 s] 3.26 s (21.9%)
profiling 15.717 s [15.717 s, 15.717 s] 837.0 ms (5.6%)
tracing 14.988 s [14.988 s, 14.988 s] 108.0 ms (0.7%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.407 s [15.407 s, 15.407 s] -
appsec 14.593 s [14.593 s, 14.593 s] -814.0 ms (-5.3%)
iast 18.251 s [18.251 s, 18.251 s] 2.844 s (18.5%)
iast_GLOBAL 17.785 s [17.785 s, 17.785 s] 2.378 s (15.4%)
profiling 15.464 s [15.464 s, 15.464 s] 57.0 ms (0.4%)
tracing 15.094 s [15.094 s, 15.094 s] -313.0 ms (-2.0%)

@ygree ygree force-pushed the ygree/llmobs-systest-fixes branch from 5cd257e to cbd6226 Compare February 24, 2026 09:31
@ygree ygree changed the title from "llmobs: set model tag even when llmobs disabled" to "fix(llmobs): set model tag even when llmobs disabled" Mar 2, 2026
ygree added 23 commits March 2, 2026 13:30
…wthTestOpenAiLlmInteractions::test_completion
…d with python openai instrumentation and system-tests
… with variables + chat_template, longest-first overlap handling) and support map-based LLM input serialization (messages + prompt) in LLMObs mapper. Also filter empty instruction messages to match system-test expectations.
…st and return [image] (not empty) when stripped input_image URLs are missing, aligning mixed-input chat_template output with expected behavior.
…output.messages from request params so existing error-span tests pass.
…JSON argument parsing and remove duplicate manual parsing logic from ResponseDecorator.
Member

@Kyle-Verhoog Kyle-Verhoog left a comment

LLMObs Team Review

Nice work aligning the Java SDK payloads with the intake schema — this is a big step for system test compliance. A few items to address/clarify below (inline), plus some overall notes:

Test Coverage Notes

What's well-covered: LLMObsSpanMapperTest expansion is great — covers _dd map, nested meta.error, map-based input with prompt/chat_template, tool definitions, tool calls + tool results. The decorator tests verify the new tags (source, integration, error, ddtrace.version).

Gaps to consider:

  • Error paths: No test exercises the error-path defaults (model_name and empty output set during withResponseCreateParams when the HTTP call fails). A test where the response errors out and verifying the span still has model_name and placeholder output would be valuable.
  • Prompt tracking: enrichInputWithPromptTracking(), extractChatTemplate(), extractPromptFromParams(), and normalizePromptVariable() have no unit tests. Template variable replacement edge cases (overlapping values, empty variables, image/file fallbacks) would increase confidence.
  • Custom/MCP tool calls: ToolCallExtractor.getToolCall(ResponseCustomToolCall) and getToolCall(McpCall) are new with no unit tests.
  • JsonValueUtils: New utility class with no dedicated tests for recursive JSON-to-Object conversion.
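
The overlapping-value edge case raised for prompt tracking can be illustrated with a small sketch. It assumes a chat template is derived by replacing each variable's value in the rendered text with a {{name}} placeholder, processing the longest values first so that a short value contained in a longer one does not split it. ChatTemplateSketch and toTemplate are hypothetical names and the {{name}} syntax is an assumption; this is not the PR's extractChatTemplate() implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChatTemplateSketch {

  // Replaces variable values with {{name}} placeholders, longest value first.
  // Assumes variable values are non-null.
  static String toTemplate(String rendered, Map<String, String> variables) {
    List<Map.Entry<String, String>> entries = new ArrayList<>(variables.entrySet());
    entries.sort(
        Comparator.comparingInt((Map.Entry<String, String> e) -> e.getValue().length())
            .reversed());
    String result = rendered;
    for (Map.Entry<String, String> e : entries) {
      if (e.getValue().isEmpty()) {
        continue; // replacing "" would insert a placeholder between every character
      }
      result = result.replace(e.getValue(), "{{" + e.getKey() + "}}");
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> vars = new LinkedHashMap<>();
    vars.put("city", "New York");
    vars.put("state", "New York State");
    // prints: Weather in {{state}}, near {{city}}?
    System.out.println(toTemplate("Weather in New York State, near New York?", vars));
  }
}
```

Processing "New York" before "New York State" would instead yield "{{city}} State", which is the kind of case a unit test for overlapping values could pin down. One limitation of this sketch: a later value that happens to match text inside an already inserted placeholder would also be replaced.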

Questions

  1. The min version bump from 3.0.0 to 3.0.1 — what API was missing in 3.0.0? This affects which customer versions get instrumented.
  2. For the _dd map — does the intake expect apm_trace_id to equal trace_id? In other SDKs these can differ (APM trace ID vs LLMObs ID).


boolean errored = span.getError() == 1;
writable.writeUTF8(STATUS);
writable.writeString(span.getError() == 0 ? "ok" : "error", null);
Member

The top-level error: 0/1 integer field has been removed and replaced with status: "ok"/"error" + error details nested under meta.error. Can you confirm no downstream consumers (EvP remapper, indexer facets, etc.) read error from the top level? This is a payload shape change that could be breaking if anything depends on the old field.

Contributor Author

This change is dictated by the TestOpenAiLlmInteractions::test_chat_completion assertion. I assume that the system test assertions are correct. Have they been verified as being compliant with the requirements of downstream consumers?

Contributor Author

If I leave the top-level error field, the system test will fail.

@ygree ygree force-pushed the ygree/llmobs-systest-fixes branch from 6dcdaf4 to 717a8f0 Compare March 24, 2026 18:43
apply from: "$rootDir/gradle/java.gradle"

- def minVer = '3.0.0'
+ def minVer = '3.0.1'
Contributor Author

@ygree ygree Mar 24, 2026

The ResponseTextConfig accessor fun verbosity(): Optional<Verbosity> was added in 3.0.1: openai/openai-java@c1de354#diff-6b385fb153d457757ba112e6117593cb59da6af308cce0f9b6f26e3885befc6cR73

@ygree
Contributor Author

ygree commented Mar 24, 2026

Questions

  1. The min version bump from 3.0.0 to 3.0.1 — what API was missing in 3.0.0? This affects which customer versions get instrumented.

The ResponseTextConfig accessor fun verbosity(): Optional<Verbosity> was added in 3.0.1: openai/openai-java@c1de354#diff-6b385fb153d457757ba112e6117593cb59da6af308cce0f9b6f26e3885befc6cR73

  2. For the _dd map: does the intake expect apm_trace_id to equal trace_id? In other SDKs these can differ (APM trace ID vs LLMObs ID).

This is aligned with dd-trace-py https://github.com/DataDog/dd-trace-py/blob/876c5f1ce4d173815537798a6a7b0ac15b0a4ede/ddtrace/llmobs/_llmobs.py#L618-L622.

@ygree ygree requested a review from a team as a code owner March 26, 2026 04:08
@ygree ygree added this to the 1.61.0 milestone Mar 26, 2026
@ygree ygree requested a review from Kyle-Verhoog March 26, 2026 18:13
Contributor

@amarziali amarziali left a comment

For apm-java, only the TagAssert file is concerned, so overall I'm delegating the review to llmops / idm.

@ygree
Contributor Author

ygree commented Apr 1, 2026

/merge

@gh-worker-devflow-routing-ef8351

gh-worker-devflow-routing-ef8351 bot commented Apr 1, 2026

View all feedbacks in Devflow UI.

2026-04-01 16:55:21 UTC ℹ️ Start processing command /merge


2026-04-01 16:55:26 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in master is approximately 2h (p90).


2026-04-01 17:52:55 UTC ℹ️ MergeQueue: This merge request was merged

@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d bot merged commit e307e2c into master Apr 1, 2026
566 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d bot deleted the ygree/llmobs-systest-fixes branch April 1, 2026 17:52

Labels

comp: mlobs ML Observability (LLMObs) tag: ai generated Largely based on code generated by an AI or LLM type: bug Bug report and fix

3 participants